Auto-Discovery of NVEF Word-Pairs in Chinese

Authors

  • Jia-Lin Tsai
  • Gladys Hsieh
  • Wen-Lian Hsu

Abstract

A meaningful noun-verb word-pair in a sentence is called a noun-verb event-frame (NVEF). We previously developed an NVEF word-pair identifier and showed that NVEF knowledge can be used effectively to resolve the Chinese word-sense disambiguation (WSD) problem (93.7% accuracy) and the Chinese syllable-to-word (STW) conversion problem (99.66% accuracy) on the NVEF-related portion. In this paper, we propose a method for automatically acquiring large-scale NVEF knowledge without human intervention. The automatic discovery of NVEF knowledge involves four major processes: (1) segmentation check; (2) initial part-of-speech (POS) sequence generation; (3) NV knowledge generation; and (4) automatic NVEF knowledge confirmation. Our experimental results show that the precision of the automatically acquired NVEF knowledge reaches 98.52% on the test sentences. The method has automatically discovered more than three hundred thousand NVEF word-pairs from the 2001 United Daily News (2001 UDN) corpus. The acquired NVEF knowledge covers 48% of the NV-sentences in the Academia Sinica Balanced Corpus (ASBC), where an NV-sentence is one containing at least one noun and one verb. In future work, we will expand the NVEF knowledge to cover more than 75% of the NV-sentences in ASBC. We will also apply the acquired NVEF knowledge to other NLP research, in particular shallow parsing, syllable/speech understanding, and text indexing.
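The coverage figure above rests on a simple definition: an NV-sentence contains at least one noun and one verb. The Python sketch below illustrates how such a coverage ratio could be computed. It is an illustration under stated assumptions, not the authors' procedure: sentences are assumed to be already segmented and POS-tagged, the coarse tags `N` and `V` are a hypothetical tag set, and a sentence is treated as "covered" if any noun-verb pair it contains appears in the knowledge set. The function names are likewise hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# A tagged sentence is a list of (word, pos) tuples. The coarse tags "N" (noun)
# and "V" (verb) are a hypothetical tag set assumed for this sketch.
Tagged = List[Tuple[str, str]]


def is_nv_sentence(sentence: Tagged) -> bool:
    """An NV-sentence contains at least one noun and at least one verb."""
    tags = {pos for _, pos in sentence}
    return "N" in tags and "V" in tags


def candidate_pairs(sentence: Tagged) -> Set[Tuple[str, str]]:
    """Every (noun, verb) word-pair that co-occurs in the sentence."""
    nouns = [w for w, pos in sentence if pos == "N"]
    verbs = [w for w, pos in sentence if pos == "V"]
    return {(n, v) for n in nouns for v in verbs}


def nvef_coverage(sentences: Iterable[Tagged],
                  knowledge: Set[Tuple[str, str]]) -> float:
    """Fraction of NV-sentences containing at least one known NVEF word-pair."""
    nv = [s for s in sentences if is_nv_sentence(s)]
    if not nv:
        return 0.0
    covered = sum(1 for s in nv if candidate_pairs(s) & knowledge)
    return covered / len(nv)


if __name__ == "__main__":
    # Toy data: two NV-sentences and one sentence without a noun or verb.
    sentences = [
        [("音樂會", "N"), ("開始", "V")],
        [("學生", "N"), ("吃", "V"), ("蘋果", "N")],
        [("很", "ADV"), ("好", "ADJ")],
    ]
    knowledge = {("音樂會", "開始")}
    print(nvef_coverage(sentences, knowledge))  # → 0.5
```

Only the first NV-sentence contains a pair from the toy knowledge set, so one of the two NV-sentences is covered; the third sentence is excluded from the denominator because it is not an NV-sentence.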

Similar resources

Auto-Generation of NVEF Knowledge in Chinese

Noun-verb event frame (NVEF) knowledge in conjunction with an NVEF word-pair identifier [Tsai et al. 2002] comprises a system to support natural language processing (NLP) and natural language understanding (NLU). In [Tsai et al. 2002a], we demonstrated that NVEF knowledge can be used effectively to resolve the Chinese word-sense disambiguation (WSD) problem with 93.7% accuracy for nouns and ver...

Applying an NVEF Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem

Syllable-to-word (STW) conversion is important in Chinese phonetic input methods and speech recognition. There are two major problems in the STW conversion: (1) resolving the ambiguity caused by homonyms; (2) determining the word segmentation. This paper describes a noun-verb event-frame (NVEF) word identifier that can be used to solve these problems effectively. Our approach includes (a) an NV...

Word Sense Disambiguation and Sense-Based NV Event Frame

Word sense is ambiguous in natural language processing (NLP). This phenomenon is particularly acute in cases involving noun-verb (NV) word-pairs. This paper describes a sense-based noun-verb event frame (NVEF) identifier that can be used to disambiguate word sense in Chinese sentences effectively. A knowledge representation system (the NVEF-KR tree) for the NVEF sense-pair identifier is also pro...

Applying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem

Syllable-to-word (STW) conversion is a frequently used Chinese input method that is fundamental to syllable/speech understanding. The two major problems with STW conversion are the segmentation of syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that can be used to resolve homonym/segmentation ambiguities and perform STW convers...

Auto-extracting Paraphrases of Letter-word Phrases in Live Texts

In this paper, we discuss the auto-extraction of paraphrases of letter-word phrases in live Chinese texts. The paper first discusses the modes of conventional dictionaries and then gives the principles of paraphrasing letter-word phrases; second, it analyzes examples of letter-word phrase paraphrases, then gives their formalized denotations and presents an auto-recogni...

Publication date: 2003